US State Population vs European Country Population

People often compare the US to European countries. This is not a fair comparison, because the population of the US is far greater than that of any European nation. A better comparison is between US states and European countries. In this Jupyter Notebook I will create a few visualizations to highlight how US state populations are generally comparable to the populations of European countries.

Choropleth visualization technique

Choropleths are geographic maps that color delineated areas (here, states or countries) according to the value of a chosen variable. For geographic data they are often easier to read than a classic bar chart, because each value is shown in its real-world location. We will use choropleths to highlight population similarities between US states and European countries.
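A continuous color scale works by placing each area's value at a position between 0 and 1 within a fixed range and drawing the color found at that position. This notebook uses the same range, range_color=[0,85000000], for both maps, so a given color stands for the same population on each. The sketch below shows only that normalization step in plain Python; the helper name scale_position is hypothetical, and clipping out-of-range values to the end colors is an assumption about how the color scale treats them.

```python
def scale_position(value, vmin=0, vmax=85_000_000):
    """Return where `value` falls on a color scale spanning [vmin, vmax], as a number in [0, 1].

    Hypothetical helper: values outside the range are clipped to the ends.
    """
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = (value - vmin) / (vmax - vmin)
    return min(max(t, 0.0), 1.0)


# Two areas with the same population land on the same position (and color),
# whichever map they are drawn on.
print(scale_position(42_500_000))   # 0.5
print(scale_position(100_000_000))  # 1.0 (clipped)
```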

Plotly Library

Plotly is an open-source Python plotting library (Plotly libraries also exist for other programming languages, such as R and JavaScript). It can create over 40 chart types, its charts are interactive, and it has a large user base. We will use Plotly to display the population similarities between US states and European countries.

Importing libraries

In [1]:
import sys
!{sys.executable} -m pip install plotly
Requirement already satisfied: plotly in /opt/conda/lib/python3.7/site-packages (5.14.1)
Requirement already satisfied: packaging in /opt/conda/lib/python3.7/site-packages (from plotly) (19.0)
Requirement already satisfied: tenacity>=6.2.0 in /opt/conda/lib/python3.7/site-packages (from plotly) (8.2.2)
Requirement already satisfied: pyparsing>=2.0.2 in /opt/conda/lib/python3.7/site-packages (from packaging->plotly) (2.4.2)
Requirement already satisfied: six in /opt/conda/lib/python3.7/site-packages (from packaging->plotly) (1.12.0)
In [2]:
import pandas as pd
import numpy as np
import plotly.express as px

# Websites the data used in this notebook came from:
# https://worldpopulationreview.com/states
# https://worldpopulationreview.com/country-rankings/countries-in-europe
# https://worldpopulationreview.com/country-rankings/country-codes

Using Pandas to organise and clean the data

In [3]:
state_pop = pd.read_csv('assets/us_state_pop.csv')

state_pop = state_pop[['State', 'Pop']].rename(columns={'State': 'state', 'Pop': 'pop'})
# Two-letter codes assigned by position: this only works if the CSV lists the
# 50 states, DC and Puerto Rico in exactly this order
state_pop['state_code'] = ['CA', 'TX', 'FL', 'NY', 'PA', 'IL', 'OH', 'GA', 'NC', 'MI', 'NJ', 'VA', 'WA', 'AZ', 'MA', 'TN', 'IN', 'MO', 'MD', 'WI', 'CO', 'MN', 'SC', 'AL', 'LA', 'KY', 'OR', 'OK', 'CT', 'UT', 'IA', 'NV', 'AR', 'PR', 'MS', 'KS', 'NM', 'NE', 'ID', 'WV', 'HI', 'NH', 'ME', 'MT', 'RI', 'DE', 'SD', 'ND', 'AK', 'DC', 'VT', 'WY']

all_euro_pop = pd.read_csv('assets/all_euro_pop.csv')
country_codes = pd.read_csv('assets/country_codes.csv')  # contains three-letter country codes (cca3)
# Match country codes to countries; countries without population data get NaN and are dropped below
merged_df = country_codes.merge(all_euro_pop, how='left', on='country')
all_euro = merged_df.dropna()
all_euro = all_euro[['cca3', 'pop2020', 'country']].copy()  # copy avoids a SettingWithCopyWarning
all_euro['pop2020'] = all_euro['pop2020'] * 1000  # the source lists population in thousands
# Drop Russia because it is part of both Europe and Asia (assumes the CSV spells it 'Russia')
all_euro = all_euro[all_euro['country'] != 'Russia']
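The cleaning cell above depends on two things that are easy to break: the hard-coded list of state codes only lines up if us_state_pop.csv lists the areas in exactly that order, and the left merge followed by dropna() silently loses any European country whose name is spelled differently in country_codes.csv. Below is a minimal sketch of two checks, assuming the same column names as the cell above. The tiny inline tables and the partial STATE_CODES dictionary are placeholders, not the notebook's data, and the merge here starts from the population table (the reverse of the cell above) so that unmatched European countries show up.

```python
import pandas as pd

# Placeholder inputs with the same column names as the notebook's tables.
state_pop = pd.DataFrame({"state": ["Texas", "California", "Puerto Rico"],
                          "pop": [1, 2, 3]})
all_euro_pop = pd.DataFrame({"country": ["Germany", "France", "Czechia"],
                             "pop2020": [10, 20, 30]})
country_codes = pd.DataFrame({"country": ["Germany", "France", "Czech Republic"],
                              "cca3": ["DEU", "FRA", "CZE"]})

# 1) Map state codes by name instead of by row position.
#    Partial, hypothetical dictionary; a full version would list all 52 areas.
STATE_CODES = {"California": "CA", "Texas": "TX", "Puerto Rico": "PR"}
state_pop["state_code"] = state_pop["state"].map(STATE_CODES)
unmapped = state_pop.loc[state_pop["state_code"].isna(), "state"].tolist()
print("States without a code:", unmapped)

# 2) Merge on the shared 'country' column and report rows that found no code.
merged = all_euro_pop.merge(country_codes, how="left", on="country", indicator=True)
unmatched = merged.loc[merged["_merge"] == "left_only", "country"].tolist()
print("Countries without a code:", unmatched)
```

With these placeholder tables the second check reports Czechia, because the code table spells it "Czech Republic".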

Creating the choropleths with Plotly

In [4]:
# choropleth for the US states
state_fig = px.choropleth(state_pop,
                          locations='state_code',  # https://plotly.github.io/plotly.py-docs/generated/plotly.express.choropleth.html
                          color='pop',  # column that sets each area's color
                          hover_name='state',  # name shown in the hover box over each area
                          locationmode='USA-states',  # locations are two-letter state codes
                          color_continuous_scale=px.colors.sequential.solar_r,  # https://plotly.com/python/builtin-colorscales/
                          range_color=[0, 85000000])  # same range as the Europe map so colors are comparable; https://plotly.com/python/colorscales/

state_fig.update_layout(
    title_text='2020 Population by US State',
    geo_scope='usa')  # limit the map view to the USA; https://plotly.com/python/reference/layout/geo/#layout-geo-scope
In [5]:
# choropleth for the European countries
country_fig = px.choropleth(all_euro,
                            locations='cca3',  # three-letter ISO codes (Plotly's default locationmode, 'ISO-3')
                            color='pop2020',  # column that sets each area's color
                            hover_name='country',  # name shown in the hover box over each area
                            color_continuous_scale=px.colors.sequential.solar_r,  # https://plotly.com/python/builtin-colorscales/
                            range_color=[0, 85000000])  # same range as the US map so colors are comparable

country_fig.update_layout(
    title_text='2020 Population by European Country',
    geo_scope='europe')  # limit the map view to Europe

country_fig.show()

Displaying the choropleths closer to each other

Even Plotly's design reflects how rarely US states are compared with European countries: the Plotly Express choropleth function takes a single locationmode per figure, so one call cannot color US states (two-letter state codes) and European countries (three-letter ISO codes) at the same time. So I display the two choropleths one above the other.

In [6]:
state_fig.show()
In [7]:
country_fig.show()

Bar chart ordered by population size

This bar chart shows US state and European country populations on one axis, sorted from smallest to largest.

In [8]:
# Rename the columns so the country and state DataFrames line up, and add a
# 'region' column to tell countries and states apart
all_euro = all_euro.rename(columns={'country': 'area', 'cca3': 'code', 'pop2020': 'pop'})
all_euro['region'] = 'europe'
# Georgia is both a country and a US state; give each a distinct name so they
# are not drawn as one category in charts that use 'area' as the axis
all_euro.loc[all_euro['area'] == 'Georgia', 'area'] = 'Georgia_country'

state_pop = state_pop.rename(columns={'state': 'area', 'state_code': 'code'})
state_pop['region'] = 'usa'
state_pop.loc[state_pop['area'] == 'Georgia', 'area'] = 'Georgia_state'  # same Georgia fix for the state

combined_df = pd.concat([all_euro, state_pop], ignore_index=True)  # stack the country and state rows

combined_df = combined_df.sort_values(by=['pop'])  # sort by population, smallest first
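The two Georgia fixes in the cell above are needed because a chart that uses area names as categories (like the matplotlib chart at the end of this notebook) would otherwise draw the state and the country at the same position. A more general sketch, assuming the area and region columns built above, finds every name that occurs more than once and appends its region. The sample rows are placeholders.

```python
import pandas as pd

# Placeholder rows shaped like combined_df (area, code, pop, region).
combined_df = pd.DataFrame({
    "area": ["Georgia", "Georgia", "Ohio"],
    "code": ["GEO", "GA", "OH"],
    "pop": [4, 10, 11],
    "region": ["europe", "usa", "usa"],
})

# keep=False marks every occurrence of a repeated name, not just the later ones.
dup = combined_df["area"].duplicated(keep=False)
combined_df.loc[dup, "area"] = combined_df.loc[dup, "area"] + "_" + combined_df.loc[dup, "region"]
print(combined_df["area"].tolist())  # ['Georgia_europe', 'Georgia_usa', 'Ohio']
```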
In [9]:
#display the chart
fig = px.bar(combined_df, x='code', y='pop', hover_name='area', labels={'pop':'Population', 'code': 'Area'}, height=400) #https://plotly.com/python-api-reference/generated/plotly.express.bar.html
fig.show()
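To put numbers on the claim that state populations are comparable to those of European countries, one option is to pair each state with the European country whose population is closest. This is a sketch using pandas.merge_asof with direction='nearest', assuming the area, pop and region columns of combined_df. Both inputs must be sorted by their population column, and the sample rows are rough placeholders rather than the notebook's figures.

```python
import pandas as pd

# Placeholder rows shaped like combined_df (rough, illustrative populations).
combined_df = pd.DataFrame({
    "area": ["Ohio", "Vermont", "Netherlands", "Iceland", "Germany"],
    "pop": [11_700_000, 640_000, 17_100_000, 340_000, 83_000_000],
    "region": ["usa", "usa", "europe", "europe", "europe"],
})

states = combined_df[combined_df["region"] == "usa"].sort_values("pop")
countries = (combined_df[combined_df["region"] == "europe"]
             .sort_values("pop")
             .rename(columns={"area": "closest_country", "pop": "country_pop"}))

# For every state, take the country with the nearest population.
pairs = pd.merge_asof(states[["area", "pop"]],
                      countries[["closest_country", "country_pop"]],
                      left_on="pop", right_on="country_pop",
                      direction="nearest")
pairs["ratio"] = pairs["pop"] / pairs["country_pop"]  # state pop / country pop
print(pairs)
```

A ratio close to 1 for most states would support the notebook's point more directly than the chart alone.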

Bars colored by state or country

This version colors each bar by region (US state or European country) within the same chart.

In [10]:
fig = px.bar(combined_df, x='code', y='pop', hover_name='area', color='region', labels={'pop':'Population', 'code': 'Area'}, height=400) # color the bars by region
fig.show()

Highlighting US population size vs European countries

This chart adds the US as a whole to the European countries to show how much larger its population is than that of any single European country.

In [11]:
usa = pd.DataFrame([{'code': 'USA', 'pop': 331002651, 'area': 'United States of America', 'region': 'usa'}])
# Add a row for the whole USA to the European data (DataFrame.append was removed in pandas 2.0)
euro_usa = pd.concat([all_euro, usa], ignore_index=True)
euro_usa = euro_usa.sort_values(by=['pop'])
In [12]:
fig = px.bar(euro_usa, x='code', y='pop', hover_name='area', color='region', labels={'pop':'Population', 'code': 'Area'}, height=400)
fig.show()

(Optional) Population bar chart separated by color using matplotlib

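When color='region' is set, Plotly Express draws a separate bar trace for each region, and by default the x-axis then groups the bars by region instead of keeping them in population order. So I created a bar chart in matplotlib that keeps the population order and colors US states and European countries differently, to highlight the population similarities.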

In [13]:
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches #needed to insert a legend
%matplotlib inline

fig = plt.figure(figsize=(80, 50))

x = combined_df['area']
y = combined_df['pop']
x_pos = list(range(len(x)))  # one bar position per area, in population order

# Color each bar by region instead of listing the US state bar indexes by hand
bar_colors = ['red' if region == 'usa' else 'blue' for region in combined_df['region']]
plt.bar(x_pos, y, color=bar_colors)
plt.xticks(x_pos, x, rotation=90)

plt.tick_params(axis='both', which='major', labelsize=50)
plt.ylabel("Population in 10M", fontsize=100)  # tick values are shown with a 1e7 (10 million) multiplier
red_patch = mpatches.Patch(color='red', label='US state population')  # legend entry with a fixed color https://matplotlib.org/tutorials/intermediate/legend_guide.html
blue_patch = mpatches.Patch(color='blue', label='European country population')
plt.legend(handles=[red_patch, blue_patch], loc='upper left', prop={'size': 90})  # legend in the upper left corner

plt.show()